A comparison of two registry-based systems for the surveillance of persons hospitalised with COVID-19 in Norway, February 2020 to May 2022

Background The surveillance of persons hospitalised with COVID-19 has been essential to ensure timely and appropriate public health response. Ideally, surveillance systems should distinguish persons hospitalised with COVID-19 from those hospitalised due to COVID-19. Aim We compared data in two national electronic health registries in Norway to critically appraise and inform the further development of the surveillance of persons hospitalised with COVID-19. Method We included hospitalised COVID-19 patients registered in the Norwegian Patient Registry (NPR) or the Norwegian Pandemic Registry (NoPaR) with admission dates between 17 February 2020 and 1 May 2022. We linked patients, identified overlapping hospitalisation periods and described the overlap between the registries. We described the prevalence of International Classification of Diseases (ICD-10) diagnosis codes and their combinations by main cause of admission (clinically assessed as COVID-19 or other), age and time. Results In the study period, 19,486 admissions with laboratory-confirmed COVID-19 were registered in NoPaR and 21,035 with the corresponding ICD-10 code U07.1 in NPR. Up to late 2021, there was a 90–100% overlap between the registries, which thereafter decreased to < 75%. The prevalence of ICD-10 codes varied by reported main cause, age and time. Conclusion Changes in patient cohorts, virus characteristics and the management of COVID-19 patients from late 2021 impacted the registration of patients and coding practices in the registries. Using ICD-10 codes for the surveillance of persons hospitalised due to COVID-19 requires age- and time-specific definitions and ongoing validation to consider temporal changes in patient cohorts and virus characteristics.


Introduction
Coronavirus disease 2019 (COVID-19) is caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the spectrum of disease may range from asymptomatic infection to severe respiratory failure. Since the start of the COVID-19 pandemic in early 2020, the surveillance of persons hospitalised with COVID-19 has been essential to ensure timely and appropriate public health response. Data from this surveillance have informed understanding of the trend and severity of the pandemic and also been used to study factors associated with severe disease [1][2][3][4][5][6][7]. Different approaches to this surveillance have emerged around Europe. Examples include data collection integrated with infectious disease notifications [2,8], systems based on pre-existing national patient registries [9][10][11], systems for the surveillance of severe acute respiratory infection (SARI) [8,12] and implementation of a voluntary patient-level clinical survey [13].
During the first 2 years of the COVID-19 pandemic, concern about healthcare capacity led to strict infection prevention and control measures worldwide. However, with high vaccine effectiveness against severe disease found to be more sustained than against infection [3,4] and the global spread of the less virulent Omicron variant (Phylogenetic Assignment of Named Global Outbreak (Pango) lineage designation B.1.1.529) from late 2021 [5][6][7], many countries scaled back nonpharmaceutical interventions and testing for SARS-CoV-2. These developments increased community transmission and reduced the proportion of COVID-19 cases diagnosed, but also reduced the proportion of cases who developed severe disease. This increased the importance of the surveillance of hospitalisation, not only with, but due to COVID-19, as a positive test for SARS-CoV-2 could be incidental in a larger proportion of hospitalised persons. Some countries have developed indicators for this surveillance [9,14].
Norway (population 5.4 million) confirmed its first case of COVID-19 in February 2020 and has experienced several waves of SARS-CoV-2 infection, including those driven by the Alpha (Pango lineage designation B.1.1.7), Delta (Pango lineage B.1.617.2) and Omicron variants [15]. Throughout the pandemic, Norway has been in the unique position of having two national electronic health registries (EHR) for the surveillance of persons hospitalised with COVID-19, one of which also disaggregates admissions due to COVID-19 based on a clinical assessment. To critically appraise and inform the further development of the surveillance of persons hospitalised with COVID-19, we linked and compared data in the two EHR from the first 2 years of the pandemic.

Data sources Norwegian Pandemic Registry (NoPaR)
The Norwegian Pandemic Registry (NoPaR) has been the primary data source in Norway for the surveillance of persons hospitalised with COVID-19. It is a national clinical registry established in March 2020 as an expansion of the Norwegian Intensive Care Registry [16]. Data on hospital stays for patients who test positive for SARS-CoV-2 by PCR are collected in NoPaR. Patients admitted with sequelae of COVID-19 are registered if they tested positive < 3 months before admission. Patients readmitted for non-COVID-19-related causes are not registered if they are not isolated. Outpatient visits are not registered [17]. All Norwegian hospitals report to NoPaR. Reporting is mandatory and reporting criteria in the study period were consistent. Data collected include demographic characteristics (age, sex, underlying comorbidities), time of admission and discharge, and clinical condition and treatment at admission and during the stay of the patient. The reported main cause of hospitalisation (COVID-19 or other) is the physician's clinical assessment. For patients with underlying comorbidities, COVID-19 is reported as the main cause if it contributed to a worsening of the underlying condition that necessitated hospitalisation. International Classification of Diseases (ICD) diagnosis codes [18] are not registered.

Norwegian Patient Registry (NPR)
The Norwegian Patient Registry (NPR) is a central health registry established in 1997 and contains data on all patients who are referred to or receive specialist healthcare at a hospital, outpatient clinic or contracted specialist in Norway [19]. All Norwegian hospitals report to NPR, and reporting is mandatory [20]. At discharge, at the latest, the ICD diagnosis codes are registered and related to the reimbursement claims of the hospitals. Throughout the pandemic, national guidelines have recommended the use of the ICD-10 code U07.1 (COVID-19, virus identified) when COVID-19 is laboratory-confirmed, regardless of the clinical illness of the patient, and that the code is registered in addition and secondary to relevant codes for the clinical illness (e.g. pneumonia) [21]. Throughout the pandemic and as of March 2023, national guidelines recommend PCR to confirm all patients who seek healthcare for COVID-19 and in all cases where confirmation is important for differential diagnosis and choice of treatment. The Norwegian surveillance system for communicable diseases laboratory database (MSIS-labdatabase) The Norwegian surveillance system for communicable diseases laboratory database (MSIS-labdatabase) [22] contains results for microbiological samples analysed for SARS-CoV-2 in all medical microbiological laboratories in Norway.

The national population registry
The national population registry includes demographic and administrative information on all individuals who reside or have resided in Norway [23]. This includes the national identification number, which was essential in our study for analyses that required linking registries.

Data access
We obtained data through the emergency preparedness registry for COVID-19 (Beredt C19), housed at the Norwegian Institute of Public Health. This registry contains individual-level, daily updated data from different central health registries, national clinical registries and other relevant national registries [24]. Owing to the principle of data minimisation, researchers in Beredt C19 are only given access to full ICD-10 codes from NPR that are necessary for performing the required analyses. The full ICD-10 codes we had access to are presented in Supplement 1. We extracted data on 12 May 2022.

Table 1
Defined time periods and public health measures taken during the COVID-19 pandemic, Norway, 17 February 2020-1 May 2022 Week/year Dominant circulating virus variant [15] COVID-19 vaccination programme [15] Summary of major national public health measures implemented during periods of peak transmission a [38] 9/2020-6/2021 Wild type/'Wuhan' Started week 52/2020 In mid-March 2020, public health measures were implemented, including the closure of preschools, schools and hospitality services and businesses with one-to-one customer contact, cancellation of cultural and sporting arrangements and closing of borders to non-residents. Cases, close contacts and travellers returning from areas with high transmission were obliged to quarantine. Hospitals were instructed to reduce normal operations and prepare for an influx of COVID-19 patients. The restrictions were gradually lifted from mid-April 2020, although some remained (e.g. quarantine for cases, close contacts and most travellers), as well as the general recommendations to stay home when sick, wash hands, limit social contact and maintain a 1 m distance.

Patient cohorts
We included patients registered in NPR or NoPaR with hospital admission dates between 17 February 2020 and 1 May 2022. From NPR, we included overnight stays (both urgent and elective admissions). We considered individual stays for the same patient with < 2 days between discharge and subsequent admission to be part of the same hospitalisation period, regardless of the main cause (NoPaR) or diagnosis codes (NPR) registered for subsequent admissions. An individual could thus have several COVID-19 hospitalisation periods if there were ≥ 2 days between stays. We calculated the age of the patients at the start of the hospitalisation period from birth dates in the national population register and defined four age groups (0-17, 18-54, 55-74 and ≥ 75 years). We defined four time periods which are presented in Table 1.

System coverage
We identified overlapping hospitalisation periods in NPR and NoPaR. To explore what proportion of all patients with a recent positive SARS-CoV-2 test were registered in NoPaR or with the ICD-10 code U07.1 in NPR, we also linked NPR with positive SARS-CoV-2 PCR tests in MSIS-labdatabase taken ≤ 14 days before admission until discharge. We chose 14 days to ensure we identified all patients with recent positive tests that could reasonably be expected to be registered with U07.1 or in NoPaR, while 14 days was also the cut-off used in comparable registry-based surveillance systems in other countries [9,14] and studies on variant severity [6]. To identify admissions with COVID-19 in NPR, we used three definitions: (i) positive PCR test,    (ii) U07.1 and (iii) positive PCR test and U07.1. For NoPaR, we included all admissions. We calculated the proportion of admissions with COVID-19 in NPR that overlapped with admissions in NoPaR and the proportion of admissions in NoPaR that overlapped with U07.1 admissions in NPR. Proportions are presented as 4-week moving averages. Among U07.1 patients, we also described the proportion with a positive PCR test > 14 days before admission or positive rapid antigen test ≤ 14 days before admission until discharge.

Hospitalisation due to COVID-19
To study the association between ICD-10 diagnosis codes and the clinical assessment of main cause of admission, we retrieved ICD-10 codes from NPR for the first overlapping hospitalisation period for each patient in NoPaR. We only included the first overlapping period, as the similarity of multiple hospitalisations for a particular patient could distort the distribution. The ICD-10 codes available included full codes on acute upper and lower respiratory infections (ARI), while for other codes only the first letter was available (Supplement 1). For J codes (diseases of the respiratory system), we grouped codes for pneumonia (J12-J18), other acute lower respiratory infections (J20-J22 and J80) and acute upper respiratory infections (URI) (J00-J06). We grouped the respiratory syncytial virus-specific codes for pneumonia (J12.1) and acute lower respiratory infections (J20.5 and J21.0) separately, as respiratory syncytial virus should be distinguished in an integrated (meaning covering several diseases) surveillance system for viral respiratory infections [25]. Other respiratory diseases (J codes, excluding J00-J22 and J80) were grouped according to the first letter of the diagnosis code (J (non-ARI)). We calculated the prevalence of all different ICD-10 codes and their combinations by reported main cause of admission (COVID-19 or other), age group and period. For efficiency, we used an a priori algorithm (R package arules [26]). For each age group or period, we also calculated the sensitivity and specificity of selected diagnosis code combinations for representing U07.1 patients' main cause of admission. We also present the trend in new admissions for patients with COVID-19 as main cause in NoPaR and selected ICD-10 code combinations in NPR using data aggregated separately from each registry. All analyses were conducted with R (version 4.0.2) [27].  (Figure 2).

Hospitalisation due to COVID-19
We included 18,009 overlapping first admissions in NoPaR and NPR, excluding 163 for which the reported main cause was unknown. The prevalence of different ICD-10 code combinations by main cause are shown in Figure 3. For both admissions with main cause COVID-19 (n = 11,803) and other (n = 6,206), U07.1 was the most common code registered. For admissions with COVID-19 as main cause, 7,976 (68%) were registered with a pneumonia code and 4,244 (36%) with a J (non-ARI) code. There was more variation in the ICD-10 codes for admissions with another main cause than COVID-19. Detailed data by age group and period are available in Supplement 1. Regardless of period or main cause, there was generally a greater prominence of a wider range of codes among older than younger age groups. For patients ≥ 75 years, pneumonia followed by J (non-ARI) codes were most common among those admitted with COVID-19 as main cause across all periods, although less prominent from week 52/2021. A similar pattern was observed among patients 18-54 and 55-74 years, with increased prominence of R (symptoms, signs and abnormal clinical and laboratory       The sensitivity and specificity of selected ICD-10 code combinations for representing patients' main cause of admission are presented in Table 2. For this analysis, among 18,009 first overlapping hospitalisation periods in NoPaR, we only included those with U07.1 registered (n = 16,523, 92%), to explore code combinations beyond U07.1. We explored combinations of pneumonia, J (non-ARI), URI and R codes based on those most prominent among patients admitted with COVID-19 as main cause in different age groups and periods. The prevalence of ICD-10 diagnosis codes and code combinations by main cause of admission, age group and period are provided in Supplement 1. The three code combinations including pneumonia had a higher sensitivity among patients ≥ 18 years than 0-17 years. Conversely, the combination URI or R had the lowest sensitivity among all age groups and periods, except patients 0-17 years. In the period from 52/2021 the  9  17  25  33  41  49  4  12  20  28  36  44  52  8  16 Year and week Year and week sensitivity of all code combinations including pneumonia was lower and the specificity higher, compared with earlier periods.
Using data aggregated separately from each registry, the code combination pneumonia or J (non-ARI) (n = 12,221) most closely followed the trend in new admissions with COVID-19 as main cause in NoPaR (n = 12,638), although the same general trend was observed in all code combinations, aside from URI or R before the end of the study period ( Figure 4).

Discussion
Despite different registration criteria in each registry, 90-100% of hospitalisation periods overlapped between U07.1 patients in NPR and patients in NoPaR, until late 2021 when the proportion of overlapping patients gradually decreased. Our results support previous studies in different settings that found high uptake and accurate use of U07.1 to identify COVID-19 patients earlier in the pandemic [28][29][30][31]. However, from late 2021, an increasing proportion of patients in NPR with a recent positive PCR test was not registered with U07.1, nor registered in NoPaR. Furthermore, over 1,600 U07.1 patients in NPR, predominantly admitted from late 2021, did not have a registered positive PCR test ≤ 60 days before admission until discharge. This suggests increasing registration of U07.1 for patients without a positive PCR test, contrary to national guidelines. Most U07.1 patients without a positive PCR test ≤ 60 days before admission until discharge did not have a recent rapid antigen test registered, however, self-tests are not registered in MSIS-labdatabase [22], thus these data must be interpreted with caution.
The decreasing overlap between the registries from late 2021 coincided with the Delta variant being superseded by the milder Omicron variant [5], increasing vaccination coverage [15] and the gradual scaling back of non-pharmaceutical interventions and testing in Norway. This consequently impacted the flow to, and management of, COVID-19-positive persons in hospital. Patients gradually became more spread out across hospitals, instead of being treated in specific wards usually under the care of infectious disease physicians.
Our results suggest that this could have impacted the registration of new patients in NoPaR and ICD-10 codes in NPR, such that the two registries were identifying increasingly different patient cohorts and a decreasing proportion of all patients with a recent positive PCR test.
Since the start of the pandemic, Norway has disaggregated data on patients with COVID-19 as main cause of hospitalisation through a clinical assessment. The benefit of this disaggregation was clearly illustrated in late 2021, when the proportion of all COVID-19 patients hospitalised with COVID-19 as main cause fell markedly [15]. A similar trend was observed in other countries where 'due to COVID-19' was defined by diagnosis codes [9,11,32]. In our study, while certain ICD-10 code combinations closely followed the trend in new admissions with COVID-19 as main cause, the distribution of ICD-10 codes varied by age and time. The greater prominence of a wider range of codes among older age groups likely reflect a higher rate of underlying comorbidities. From late 2021 the frequency of pneumonia codes decreased in the age groups ≥ 18 years, potentially related to the increasing proportion of vaccinated patients [33][34][35]. In the same period, the proportion of patients 0-17 years admitted due to COVID-19 who were registered with a URI code increased, in line with findings from elsewhere during a period of increasing Omicron dominance [36] and the lifting of restrictions. After week 52/2021, the sensitivity of all code combinations including pneumonia was lower, compared with earlier periods. Clinical validation of the algorithm for admissions 'due to COVID-19' in Denmark also found that sensitivity decreased between Delta (95%) and early Omicron (87%) periods [9]. This highlights the challenge of using diagnosis codes for the surveillance of persons hospitalised due to COVID-19, not least the importance of age-and time-specific definitions. Other definitions of 'due to COVID-19' have also been proposed. For example, Public Health Scotland revised their definition to community-acquired hospital admissions with a positive PCR test from emergency admissions, as data on clinical discharge diagnoses were not timely enough for ongoing surveillance [14]. Surveillance of SARI, either EHR-or questionnairebased, is an alternative standardised approach established in several European countries [25]. However, in the context of the surveillance of persons hospitalised due to COVID-19, one must consider how the definition of SARI may influence the sensitivity of the system. For example, in the EHR-based sentinel system in Germany, SARI is defined as patients admitted with ICD-10 codes J09-J22 [8]. This will miss patients admitted with an URI (J00-J06), which we observed in an increasing proportion of younger persons hospitalised due to COVID-19 since Omicron emerged.
Hospital admissions remain a central indicator for COVID-19 surveillance. Data collection should be sensitive and timely for public health action, representative, accurate, sustainable, collect relevant data on the patient cohort, integrated with, but able to distinguish between, different pathogens and not entail an unnecessary reporting burden. A diverse landscape of systems for this surveillance has emerged across Europe [2,[8][9][10][11][12][13], with designs naturally tailored to the local setting, resource-availability and existing data collection infrastructure and practices. Looking forward, a European Union Joint Action 'UNITED4Surveillance' now aims to promote the integration, digitalisation and establishment of real-time surveillance systems in the European Union, including through EHR [37]. year-round surveillance, can be linked to other national registries to achieve integration between surveillance systems (e.g. molecular surveillance) and healthcare levels and may monitor additional severity outcomes like intensive care admission (not available for comparison in this study). These systems have thus outdated the sentinel, manual and weekly reporting Norway had for hospitalisation with another respiratory infection (influenza) before the pandemic. The pandemic registry collects COVID-19-specific data and data reporting has been timely [15]. However, in its current form, data registration in NoPaR entails an additional burden for hospitals and has more limited capacity to be integrated with the surveillance of other respiratory infections, as desired [25]. Surveillance systems based on electronic patient registries, like NPR, have the advantage of being part of the established data flow in hospitals and allow the integration of surveillance for different pathogens as well as syndromic and disease-specific components. Through linkage to laboratory results, they may also provide more complete data on persons hospitalised with COVID-19, as well as data on total and disease-specific hospital bed occupancy, which remains a recommended COVID-19 surveillance indicator [25]. However, the accuracy and timeliness of coding practices and changes in coding practices over time, must be considered. Also, NPR provides limited disease-specific data, such as on clinical severity and treatment. Furthermore, endeavours to accurately identify persons hospitalised due to COVID-19 will require ongoing validation to consider temporal changes in patient cohorts and virus characteristics. A future surveillance system would ideally encompass the benefits of both NoPaR and NPR, something Norwegian registries have demonstrated the feasibility of during the pandemic. A coordinated system will probably improve coverage by reducing the need for reporting via several systems.
Our study has several limitations. Firstly, different ICD-10 code combinations were registered among patients in both main cause categories. This highlights that clinicians may assess the main cause of admission differently for patients with similar diagnostic codes, leading to non-differential misclassification. This could potentially be alleviated by including more main cause categories beyond COVID-19/other [9]. Secondly, we cannot rule out that the decreasing overlap between U07.1 patients in NPR and patients in NoPaR from late 2021 affected our analysis of hospitalisation due to COVID-19 and the precision of our sensitivity and specificity estimates. Also, we did not have access to full ICD-10 codes for all diagnostic categories, which limited the exploration of whether more detailed code combinations could more precisely represent persons hospitalised due to COVID-19. We also only considered the distribution of ICD-10 codes in this analysis. However, other parameters could additionally inform more precise proxies. For example, in Denmark, the proportion of admission time related to certain diagnosis codes is considered [9]. Finally, we cannot rule out errors in data registration that may have influenced which patients and hospitalisation periods were able to be correctly linked. However, given the high degree of overlap and that most individuals have only been hospitalised with COVID-19 once, we do not believe this has unduly affected our results.

Conclusion
In this study we compared data in two EHR in Norway on persons hospitalised with COVID-19 during the first 2 years of the pandemic. Our results show that the EHR provided an accurate picture of persons hospitalised with COVID-19, but also highlight challenges with using EHR data. This comparison has allowed more comprehensive understanding of the data in each EHR through different phases of the pandemic and can inform the ongoing development of surveillance systems for COVID-19 and in preparation for future pandemics.

Ethical statement
The national emergency preparedness register for COVID-19 (Beredt C19) was established under the Health Preparedness Act §2-4 in response to the COVID-19 pandemic. Under the Infectious Disease Control Act §7-9, the Norwegian Institute of Public Health is responsible for the surveillance of infectious diseases in Norway. Approval by an ethical review board was not considered necessary.

Funding statement
This research was undertaken as part of routine work at the Norwegian Institute of Public Health. No specific funding was received.

Data availability
The dataset analysed in the study contains individual-level linked data from national registries in Norway. The researchers had access to the data through Beredt C19, housed at the Norwegian Institute of Public Health. In Beredt C19, only fully anonymised data (i.e. data that are neither directly nor potentially indirectly identifiable) are permitted to be shared publicly. Legal restrictions therefore prevent the researchers from publicly sharing the dataset used in the study that would enable others to replicate the study findings. However, external researchers are freely able to request access to linked data from the same registries from outside the structure of Beredt C19, as per normal procedure for conducting health research on registry data in Norway. Further information on Beredt C19, including contact information for the Beredt C19 project manager, and information on access to data from each individual data source, is available at https://www.fhi.no/en/id/infectious-diseases/coronavirus/ emergency-preparedness-register-for-covid-19/.